A Space-Efficient Phrase Table Implementation Using Minimal Perfect Hash Functions

Author

  • Marcin Junczys-Dowmunt
Abstract

We describe the structure of a space-efficient phrase table for phrase-based statistical machine translation with the Moses decoder. The new phrase table can be used in memory or partially mapped on disk. Compared to the standard Moses on-disk phrase table implementation, its size is reduced by a factor of 6. The focus of this work is the source phrase index, which is implemented using minimal perfect hash functions. Two methods that reduce the memory consumption of a baseline implementation are discussed.
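A source phrase index built on a minimal perfect hash function (MPHF) can be sketched as follows. This is not the paper's implementation. It assumes a simplified hash-and-displace MPHF construction (keys are grouped into buckets, then a seed is searched per bucket), a blake2b-based hash family, 32-bit fingerprints to reject phrases that were never inserted, and a hypothetical `SourcePhraseIndex` class whose integer payload stands in for something like an offset into the target-phrase data.

```python
import hashlib
from typing import Dict, List, Optional


def _h(key: str, seed: int, m: int) -> int:
    """Deterministic seeded hash in [0, m); Python's built-in hash() is salted per process."""
    digest = hashlib.blake2b(f"{seed}\x00{key}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % m


class SourcePhraseIndex:
    """Hypothetical sketch: source phrase -> payload via an MPHF plus fingerprints.
    The phrase strings themselves are not stored."""

    FP_SEED = -1  # seed reserved for fingerprints (assumption)

    def __init__(self, table: Dict[str, int], fp_bits: int = 32) -> None:
        self.n = len(table)
        self.r = max(1, self.n // 2)  # number of buckets (assumed ratio)
        self.fp_mod = 1 << fp_bits
        buckets: List[List[str]] = [[] for _ in range(self.r)]
        for phrase in table:
            buckets[_h(phrase, 0, self.r)].append(phrase)

        # Place the largest buckets first; for each, search a displacement seed
        # that sends all of its phrases to distinct, still free slots.
        self.seeds = [0] * self.r
        taken = [False] * self.n
        for b in sorted(range(self.r), key=lambda i: -len(buckets[i])):
            if not buckets[b]:
                continue
            for d in range(1, 1 << 20):
                slots = [_h(p, d, self.n) for p in buckets[b]]
                if len(set(slots)) == len(slots) and not any(taken[s] for s in slots):
                    break
            else:
                raise RuntimeError("no displacement seed found")
            self.seeds[b] = d
            for s in slots:
                taken[s] = True

        # Every slot is now used exactly once: store a fingerprint and the payload.
        self.fingerprints = [0] * self.n
        self.payload = [0] * self.n
        for phrase, value in table.items():
            s = self._slot(phrase)
            self.fingerprints[s] = _h(phrase, self.FP_SEED, self.fp_mod)
            self.payload[s] = value

    def _slot(self, phrase: str) -> int:
        seed = self.seeds[_h(phrase, 0, self.r)]
        return _h(phrase, seed, self.n)

    def get(self, phrase: str) -> Optional[int]:
        if self.n == 0:
            return None
        s = self._slot(phrase)
        # An MPHF sends unknown phrases to arbitrary slots; the fingerprint
        # rejects them except with probability about 2**-fp_bits.
        if self.fingerprints[s] != _h(phrase, self.FP_SEED, self.fp_mod):
            return None
        return self.payload[s]
```

In this sketch the index costs one small integer seed per bucket plus one fingerprint per phrase, independent of phrase length. The fingerprint width trades memory against the rate at which unseen phrases are wrongly accepted.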

Similar resources

Finding Succinct Ordered Minimal Perfect Hash Functions

An ordered minimal perfect hash table is one in which no collisions occur among a predefined set of keys, no space is unused and the data are placed in the table in order. A new method for creating ordered minimal perfect hash functions is presented. It creates hash functions with representation space requirements closer to the theoretical lower bound than previous methods. The method presented ...

Monotone minimal perfect hashing: searching a sorted table with O(1) accesses

A minimal perfect hash function maps a set S of n keys into the set {0, 1, ..., n − 1} bijectively. Classical results state that minimal perfect hashing is possible in constant time using a structure occupying space close to the lower bound of log₂ e bits per element. Here we consider the problem of monotone minimal perfect hashing, in which the bijection is required to preserve the lexicogr...

A New Algorithm for Constructing Minimal Perfect Hash Functions

We present a three-step algorithm for generating minimal perfect hash functions which runs very fast in practice. The first step is probabilistic and involves the generation of random graphs. The second step determines the order in which hash values are assigned to keys. The third step assigns hash values to the keys. We give strong evidence that the first step takes linear random time and the sec...
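The three steps (random graph, ordering, assignment) can be illustrated with the closely related CHM algorithm of Czech, Havas and Majewski. This is not necessarily the algorithm of the cited paper: CHM requires the random graph to be acyclic, and its ordering step is a plain graph traversal. The blake2b hash family, the vertex ratio `c = 2.1`, and the names `build_chm` and `chm_lookup` are assumptions for illustration.

```python
import hashlib
import random
from typing import List, Sequence, Tuple

UNSET = -1  # marks a vertex whose g value has not been assigned yet


def _h(key: str, seed: int, m: int) -> int:
    """Deterministic seeded hash in [0, m)."""
    digest = hashlib.blake2b(f"{seed}\x00{key}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % m


def build_chm(keys: Sequence[str], c: float = 2.1, max_tries: int = 1000,
              rng_seed: int = 0) -> Tuple[int, int, List[int]]:
    """CHM-style MPHF: returns (s1, s2, g) with (g[h1(k)] + g[h2(k)]) % n == index of k."""
    n = len(keys)
    m = max(3, int(c * n) + 1)  # vertices; CHM needs m > 2n for a constant chance of acyclicity
    rng = random.Random(rng_seed)
    for _ in range(max_tries):
        # Step 1: random graph with one edge (h1(k), h2(k)) per key.
        s1, s2 = rng.getrandbits(32), rng.getrandbits(32)
        edges = [(_h(k, s1, m), _h(k, s2, m)) for k in keys]
        if any(u == v for u, v in edges):
            continue  # self-loop: retry with new seeds
        adj: List[List[Tuple[int, int]]] = [[] for _ in range(m)]
        for i, (u, v) in enumerate(edges):
            adj[u].append((v, i))
            adj[v].append((u, i))

        # Steps 2 and 3: traverse each component from a root (this fixes the
        # order) and assign g so that the endpoints of edge i sum to i mod n.
        g = [UNSET] * m
        acyclic = True
        for root in range(m):
            if g[root] != UNSET:
                continue
            g[root] = 0
            stack = [(root, -1)]  # (vertex, id of the edge it was reached through)
            while stack and acyclic:
                x, via = stack.pop()
                for y, i in adj[x]:
                    if i == via:
                        continue
                    if g[y] != UNSET:  # second path to y: the graph has a cycle
                        acyclic = False
                        break
                    g[y] = (i - g[x]) % n
                    stack.append((y, i))
            if not acyclic:
                break
        if acyclic:
            return s1, s2, g
    raise RuntimeError("no acyclic graph found; increase c or max_tries")


def chm_lookup(key: str, s1: int, s2: int, g: List[int], n: int) -> int:
    m = len(g)
    return (g[_h(key, s1, m)] + g[_h(key, s2, m)]) % n
```

Because each key's value is simply its position in `keys`, passing the keys in sorted order makes the resulting function order-preserving. The cost is that `g` holds about 2.1n integers of up to log₂ n bits each, far more than non-order-preserving constructions need.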

Indexing Internal Memory with Minimal Perfect Hash Functions

A perfect hash function (PHF) is an injective function that maps keys from a set S to unique values, which are in turn used to index a hash table. Since no collisions occur, each key can be retrieved from the table with a single probe. A minimal perfect hash function (MPHF) is a PHF with the smallest possible range, that is, the hash table size is exactly the number of keys in S. MPHFs are wide...
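The PHF and MPHF definitions can be checked directly. The sketch below assumes a seeded blake2b hash as the hash family and uses hypothetical helper names. It tests the perfect and minimal-perfect properties, finds an MPHF for a tiny key set by brute-force seed search (practical only for very small n, since a random function is a bijection with probability n!/nⁿ), and then retrieves each key with a single probe.

```python
import hashlib
from typing import Callable, List, Optional, Sequence, Tuple


def seeded_hash(key: str, seed: int, m: int) -> int:
    """Deterministic seeded hash in [0, m)."""
    digest = hashlib.blake2b(f"{seed}\x00{key}".encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % m


def is_perfect(h: Callable[[str], int], keys: Sequence[str]) -> bool:
    """Perfect: no two keys in the set receive the same value."""
    values = [h(k) for k in keys]
    return len(set(values)) == len(values)


def is_minimal_perfect(h: Callable[[str], int], keys: Sequence[str]) -> bool:
    """Minimal perfect: the values are exactly 0..n-1, so a table of size n has no empty slot."""
    return sorted(h(k) for k in keys) == list(range(len(keys)))


def find_mphf_seed(keys: Sequence[str], max_seed: int = 1_000_000) -> int:
    """Brute-force search for a seed whose hash is an MPHF on `keys` (tiny n only)."""
    n = len(keys)
    for seed in range(max_seed):
        if is_minimal_perfect(lambda k: seeded_hash(k, seed, n), keys):
            return seed
    raise RuntimeError("no MPHF seed found")


def build_table(items: Sequence[Tuple[str, int]], seed: int) -> List[Optional[Tuple[str, int]]]:
    n = len(items)
    table: List[Optional[Tuple[str, int]]] = [None] * n
    for key, value in items:
        table[seeded_hash(key, seed, n)] = (key, value)
    return table


def lookup(table: List[Optional[Tuple[str, int]]], seed: int, key: str) -> Optional[int]:
    """Single probe: exactly one slot is inspected."""
    if not table:
        return None
    entry = table[seeded_hash(key, seed, len(table))]
    if entry is None or entry[0] != key:
        return None
    return entry[1]
```

Storing the key in each slot lets `lookup` reject keys outside S exactly; a space-efficient index would instead keep a short fingerprint per slot, or nothing at all.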

A Survey on Efficient Hashing Techniques in Software Configuration Management

This paper presents a survey on efficient hashing techniques in software configuration management scenarios. It introduces the most important hashing techniques, such as open hashing, separate chaining and minimal perfect hashing. Furthermore, we evaluate these hashing techniques on large data sets, comparing the hash functions in terms of time to build the data structur...

Publication date: 2012